Today we will…
Functions allow you to automate common tasks!
Writing functions has 3 big advantages over copy-paste:
Let’s define the function.
add_two <-
The name of the function is chosen by the author.
The argument(s) of the function are chosen by the author.
If we supply a default value when defining the function, the argument is optional when calling the function.
something defaults to 2.
{ }
The body of the function is where the action happens.
return()
Your function will give back what would normally print out…
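To make the pieces above concrete, here is a minimal sketch of a function whose second argument has a default value. It assumes the add_something() name used elsewhere in these slides and the default of 2 mentioned above; the stored result names are only for illustration.

```r
# Sketch: `something` has a default of 2, so it is optional when calling.
add_something <- function(x, something = 2) {
  # The body is where the action happens; return() hands the result back.
  return(x + something)
}

# Call without `something`: the default (2) is used.
default_result <- add_something(x = 5)

# Call with `something`: the supplied value overrides the default.
custom_result <- add_something(x = 5, something = 10)
```

Naming the arguments in the call (x = 5) is optional, but it keeps the call readable once a function has more than one argument.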
When a function requires an input of a specific data type, check that the supplied argument is valid.
add_something <- function(x, something){
  if(!is.numeric(x)){
    stop("Please provide a numeric input for the x argument.")
  }
  return(x + something)
}
add_something(x = "statistics", something = 5)
Error in add_something(x = "statistics", something = 5): Please provide a numeric input for the x argument.
add_something <- function(x, something){
  # || is the scalar "or" intended for use inside if() conditions
  if(!is.numeric(x) || !is.numeric(something)){
    stop("Please provide numeric inputs for both arguments.")
  }
  return(x + something)
}
add_something(x = 2, something = "R")
Error in add_something(x = 2, something = "R"): Please provide numeric inputs for both arguments.
The location (environment) in which we can find and access a variable is called its scope.
We cannot access variables created inside a function outside of the function.
Name masking occurs when an object in the function environment has the same name as an object in the global environment.
Functions look for objects FIRST in the function environment and SECOND in the global environment.
It is not good practice to rely on global environment objects inside a function!
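The scoping rules above can be sketched with a small example. All object and function names here (scale_factor, uses_global, masks_global, local_only) are hypothetical and chosen only for illustration.

```r
scale_factor <- 100  # lives in the global environment

# No local `scale_factor` exists, so R falls back to the global one.
# This works, but relying on it is the practice the slide warns against.
uses_global <- function(value) {
  value * scale_factor
}

# A local `scale_factor` masks the global object of the same name.
masks_global <- function(value) {
  scale_factor <- 2
  local_only <- "created inside the function"
  value * scale_factor
}

from_global <- uses_global(3)        # computed with the global value (100)
from_local  <- masks_global(3)       # computed with the local value (2)
unchanged   <- scale_factor          # the global object is not modified
leaked      <- exists("local_only")  # objects made inside a function do not escape
```

A safer design is to pass scale_factor in as an argument, so the function no longer depends on whatever happens to be in the global environment.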
You will make mistakes (create bugs) when coding.
print() debugging
Add print() statements throughout your code to make sure the values are what you expect.
When you have a concept that you want to turn into a function…
Write a simple example of the code without the function framework.
Generalize the example by assigning variables.
Write the code into a function.
Call the function on the desired arguments.
This structure allows you to address issues as you go.
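The four steps above might look like this for a small unit conversion. The task and every name in it (inches_to_cm, cm_per_inch) are hypothetical, picked only to keep the sketch short.

```r
# Step 1: a simple example of the code without the function framework.
10 * 2.54

# Step 2: generalize the example by assigning variables.
inches <- 10
cm_per_inch <- 2.54
inches * cm_per_inch

# Step 3: write the code into a function.
inches_to_cm <- function(inches, cm_per_inch = 2.54) {
  stopifnot(is.numeric(inches))
  return(inches * cm_per_inch)
}

# Step 4: call the function on the desired arguments.
converted <- inches_to_cm(c(1, 10))
```

Because each step runs on its own, a mistake shows up at the step that introduced it rather than inside the finished function.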
Write a function called find_car_make() that takes in the name of a car and returns the “make” of the car (the company that created it).
find_car_make("Toyota Camry") should return “Toyota”.
find_car_make("Ford Anglica") should return “Ford”.
You will write several small functions, then use them to unscramble a message. Many of the functions have been started for you, but none of them are complete as is.
Today we will…
We wrote a function called find_car_make() that takes in the name of a car and returns the “make” of the car (the company that created it).
find_car_make("Toyota Camry") returns “Toyota”.
find_car_make("Ford Anglica") returns “Ford”.
dplyr
Consider the mtcars data.
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Let’s use our new function:
mtcars |>
  rownames_to_column("make_model") |>
  mutate(make = find_car_make(make_model),
         .after = make_model) |>
  head(n = 3)
     make_model   make  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1     Mazda RX4  Mazda 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 Mazda RX4 Wag  Mazda 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3    Datsun 710 Datsun 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
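The definition of find_car_make() is not shown in this section. One minimal sketch that is consistent with the calls above, assuming the make is always the first word of the car name, uses base R's sub(); the version written in class may differ.

```r
# Sketch (assumption: the make is everything before the first space).
find_car_make <- function(car_name) {
  stopifnot(is.character(car_name))
  # Remove the first space and everything after it.
  sub(pattern = " .*$", replacement = "", x = car_name)
}

single_make <- find_car_make("Toyota Camry")
several_makes <- find_car_make(c("Mazda RX4 Wag", "Datsun 710"))
```

sub() is vectorized, which is why a function like this can be used inside mutate() on a whole column. The first-word assumption would fail for makes with more than one word.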
We want to take in a vector of numbers and standardize it: rescale the values so that they all fall between 0 and 1.
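A minimal sketch of such a function, using the std_to_01() name that appears later in these slides. The body is an assumption (min-max scaling, ignoring missing values when finding the minimum and maximum), not necessarily the definition used in class.

```r
# Sketch: min-max scaling, (x - min) / (max - min).
std_to_01 <- function(x) {
  stopifnot(is.numeric(x))
  x_min <- min(x, na.rm = TRUE)
  x_max <- max(x, na.rm = TRUE)
  (x - x_min) / (x_max - x_min)
}

# The smallest value maps to 0, the largest to 1, and NA stays NA.
scaled <- std_to_01(c(2700, 3750, NA, 6300))
```

If every value in x is identical, the denominator is 0 and the result is NaN, so a production version would likely need to handle that case explicitly.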
dplyr
Let’s standardize penguin measurements.
Is it a good idea to scale (standardize) variables in a data analysis?
Why scale?
Why not scale?
E.g., a penguin with a bill length of 35 mm (std to 0.11) and a mass of 5500 g (std to 0.78).
Note
I used the existing function std_to_01() inside the new function for clarity!
Functions using unquoted variable names as arguments are said to use nonstandard evaluation or tidy evaluation.
Tidy evaluation isn’t naturally supported when writing your own functions.
When a piece of code is defused, R doesn’t return its value like normal.
We produce defused code when we use tidy evaluation and our own functions don’t know how to handle it.
Don’t use tidy evaluation in your own functions.
Use embrace injection.
The rlang package provides the embrace operator ({{ }}) to simplify writing functions around tidyverse pipelines.
With the {{ }} operator, you can transport a variable from one function to another and can get around defused code!
std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))
  data <- data |>
    mutate(variable = std_to_01(variable))
  return(data)
}
std_column_01(penguins, body_mass_g)
Error in `mutate()`:
ℹ In argument: `variable = std_to_01(variable)`.
Caused by error:
! object 'body_mass_g' not found
mutate() doesn’t know what body_mass_g is.
Embrace the body_mass_g variable using {{ }}!
:=
When we use the embrace operator on the left-hand side of = (to name the new column), we also have to use the walrus operator (:=).
:= is an alias of =.
# A tibble: 344 × 7
   species island    bill_length_mm bill_depth_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>       <dbl> <fct>  <int>
 1 Adelie  Torgersen           39.1          18.7       0.292 male    2007
 2 Adelie  Torgersen           39.5          17.4       0.306 female  2007
 3 Adelie  Torgersen           40.3          18         0.153 female  2007
 4 Adelie  Torgersen           NA            NA        NA     <NA>    2007
 5 Adelie  Torgersen           36.7          19.3       0.208 female  2007
 6 Adelie  Torgersen           39.3          20.6       0.264 male    2007
 7 Adelie  Torgersen           38.9          17.8       0.257 female  2007
 8 Adelie  Torgersen           39.2          19.6       0.549 male    2007
 9 Adelie  Torgersen           34.1          18.1       0.215 <NA>    2007
10 Adelie  Torgersen           42            20.2       0.431 <NA>    2007
# ℹ 334 more rows
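The function that produced output like the table above is not shown. A minimal sketch of std_column_01() using {{ }} and := might look like this; the body of std_to_01() and the small toy data frame are assumptions made so the example runs without the penguins data.

```r
library(dplyr)

# Assumed helper: min-max scaling to the interval [0, 1].
std_to_01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))
  data |>
    # {{ variable }} injects the column the caller named;
    # := is needed because the embraced name sits on the left-hand side.
    mutate({{ variable }} := std_to_01({{ variable }}))
}

# Hypothetical toy data standing in for penguins.
toy <- data.frame(id = 1:3, mass = c(2700, 4500, 6300))
result <- std_column_01(toy, mass)
```

The column is overwritten in place, which matches the output above, where body_mass_g holds the standardized values.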
What if I want to modify multiple columns?
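One possible answer, sketched with dplyr's across(): embrace a selection of columns and apply the same function to each. The function name std_columns_01(), the body of std_to_01(), and the toy data are assumptions for illustration.

```r
library(dplyr)

# Assumed helper: min-max scaling to the interval [0, 1].
std_to_01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

std_columns_01 <- function(data, variables) {
  stopifnot(is.data.frame(data))
  data |>
    # across() applies std_to_01() to every selected column.
    mutate(across(.cols = {{ variables }}, .fns = std_to_01))
}

# Hypothetical toy data with two numeric columns to rescale.
toy <- data.frame(id = 1:3,
                  mass = c(2700, 4500, 6300),
                  bill = c(30, 40, 50))
result <- std_columns_01(toy, c(mass, bill))
```

Since across() keeps the original column names by default, no := is needed here.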
Use across()!
Consider a study of depression.
We implicitly assume observations are missing completely at random!
We need to take more care when dealing with missing values!
Challenge 7: Incorporating Multiple Inputs